The adaptive computationally-scalable motion estimation algorithm and its hardware implementation allow the H.264/AVC encoder to achieve efficiencies close to optimal in real-time conditions. In particular, the search algorithm achieves results close to the optimum even if the number of search points assigned to macroblocks is strongly limited and varies with time. The previously reported architecture implementing the algorithm takes at least 674 clock cycles to interpolate and load the reference area, and this number cannot be decreased without decreasing the search range. This paper proposes optimizations of the architecture that increase the maximal throughput of the motion estimation system by up to four times. Firstly, the chroma interpolation follows the search process, whereas the luma interpolation precedes it. Secondly, the luma interpolator computes 128 instead of 64 samples per clock cycle. Thirdly, the number of on-chip memories keeping the interpolated reference area is increased accordingly to 128. Fourthly, some modules previously working at the base frequency are redesigned to operate at the doubled clock. Since the on-chip memories do not store fractional-pel chroma samples, their joint size is reduced from 160.44 to 104.44 kB. Additional savings in the memory size are achieved by the sequential processing of two reference-picture areas for each macroblock. The architecture is verified in the real-time FPGA hardware encoder. Synthesis results show that the updated architecture can support 2160p@30fps encoding for 0.13 µm TSMC technology with a small increase in hardware resources and some losses in the compression efficiency. The efficiency is improved when processing smaller resolutions.
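
To give a sense of why the 674-cycle floor matters at 2160p@30fps, the following back-of-the-envelope sketch estimates the clock frequency that would be needed if every macroblock spent the full 674 cycles on reference-area interpolation and loading, with nothing overlapped. The frame size of 3840x2160, the strictly sequential per-macroblock model, and all function and constant names are assumptions made for illustration; they are not taken from the paper.

```python
# Rough cycle-budget estimate; frame size and the sequential model are assumptions.
WIDTH, HEIGHT, FPS = 3840, 2160, 30  # assumed 2160p frame dimensions and frame rate
MB_SIZE = 16                         # an H.264/AVC macroblock covers 16x16 luma samples
LOAD_CYCLES = 674                    # minimum interpolate-and-load cycles quoted above


def macroblocks_per_second(width, height, fps, mb=MB_SIZE):
    """Number of macroblocks the encoder must process each second."""
    # Ceiling division so that partially covered macroblocks are counted.
    mbs_x = -(-width // mb)
    mbs_y = -(-height // mb)
    return mbs_x * mbs_y * fps


def min_clock_hz(width, height, fps, cycles_per_mb):
    """Clock needed if each macroblock costs cycles_per_mb cycles, one after another."""
    return macroblocks_per_second(width, height, fps) * cycles_per_mb


mb_rate = macroblocks_per_second(WIDTH, HEIGHT, FPS)
required_hz = min_clock_hz(WIDTH, HEIGHT, FPS, LOAD_CYCLES)
print(f"{mb_rate} macroblocks/s -> {required_hz / 1e6:.1f} MHz")
```

Under these assumptions the load step alone would call for a clock of roughly 655 MHz, which suggests why the number of cycles per macroblock, rather than the search itself, is the target of the optimizations.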